Overview

Dataset statistics

Number of variables: 13
Number of observations: 299
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.5 KiB
Average record size in memory: 104.4 B

Variable types

Numeric: 7
Categorical: 6

Alerts

time is highly correlated with DEATH_EVENT (High correlation)
DEATH_EVENT is highly correlated with time (High correlation)
time is highly correlated with DEATH_EVENT (High correlation)
DEATH_EVENT is highly correlated with time (High correlation)
age is highly correlated with diabetes and 2 other fields (High correlation)
anaemia is highly correlated with diabetes and 4 other fields (High correlation)
creatinine_phosphokinase is highly correlated with high_blood_pressure and 3 other fields (High correlation)
diabetes is highly correlated with age and 5 other fields (High correlation)
ejection_fraction is highly correlated with high_blood_pressure and 3 other fields (High correlation)
high_blood_pressure is highly correlated with anaemia and 5 other fields (High correlation)
platelets is highly correlated with sex and 2 other fields (High correlation)
serum_creatinine is highly correlated with sex and 1 other field (High correlation)
serum_sodium is highly correlated with sex and 2 other fields (High correlation)
sex is highly correlated with age and 9 other fields (High correlation)
smoking is highly correlated with age and 9 other fields (High correlation)
time is highly correlated with DEATH_EVENT (High correlation)
DEATH_EVENT is highly correlated with anaemia and 8 other fields (High correlation)
ejection_fraction is highly correlated with serum_creatinine and 1 other field (High correlation)
serum_creatinine is highly correlated with ejection_fraction (High correlation)
sex is highly correlated with smoking (High correlation)
smoking is highly correlated with sex (High correlation)
time is highly correlated with DEATH_EVENT (High correlation)
DEATH_EVENT is highly correlated with ejection_fraction and 1 other field (High correlation)

Reproduction

Analysis started: 2021-11-17 00:27:15.764230
Analysis finished: 2021-11-17 00:27:35.851358
Duration: 20.09 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 47
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.83389298
Minimum: 40
Maximum: 95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 40
5th percentile: 42.9
Q1: 51
Median: 60
Q3: 70
95th percentile: 82
Maximum: 95
Range: 55
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 11.89480907
Coefficient of variation (CV): 0.1955293093
Kurtosis: -0.184870532
Mean: 60.83389298
Median Absolute Deviation (MAD): 10
Skewness: 0.4230619067
Sum: 18189.334
Variance: 141.4864829
Monotonicity: Not monotonic
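
Several of the statistics above are tied together by simple identities (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = maximum - minimum, IQR = Q3 - Q1). The sketch below is only a consistency check on the age figures copied from this report; the variable names are illustrative and are not part of the report's output.

```python
import math

# Figures copied from the age section of this report.
n_observations = 299
total = 18189.334
mean = 60.83389298
std_dev = 11.89480907
variance = 141.4864829
minimum, q1, q3, maximum = 40, 51, 70, 95

# Each entry checks one identity against the value printed in the report.
checks = {
    "mean = sum / n": math.isclose(total / n_observations, mean, rel_tol=1e-6),
    "variance = std_dev ** 2": math.isclose(std_dev ** 2, variance, rel_tol=1e-6),
    "CV = std_dev / mean": math.isclose(std_dev / mean, 0.1955293093, rel_tol=1e-6),
    "range = max - min": maximum - minimum == 55,
    "IQR = Q3 - Q1": q3 - q1 == 19,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

The same check applies to the other numeric variables by substituting their reported values.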
[Figure: Histogram with fixed size bins (bins=47)]

Common values

Value | Count | Frequency (%)
60 | 33 | 11.0%
50 | 27 | 9.0%
65 | 26 | 8.7%
70 | 25 | 8.4%
45 | 19 | 6.4%
55 | 17 | 5.7%
75 | 11 | 3.7%
53 | 10 | 3.3%
58 | 10 | 3.3%
63 | 8 | 2.7%
Other values (37) | 113 | 37.8%

Minimum 10 values

Value | Count | Frequency (%)
40 | 7 | 2.3%
41 | 1 | 0.3%
42 | 7 | 2.3%
43 | 1 | 0.3%
44 | 2 | 0.7%
45 | 19 | 6.4%
46 | 3 | 1.0%
47 | 1 | 0.3%
48 | 2 | 0.7%
49 | 4 | 1.3%

Maximum 10 values

Value | Count | Frequency (%)
95 | 2 | 0.7%
94 | 1 | 0.3%
90 | 3 | 1.0%
87 | 1 | 0.3%
86 | 1 | 0.3%
85 | 6 | 2.0%
82 | 3 | 1.0%
81 | 1 | 0.3%
80 | 7 | 2.3%
79 | 1 | 0.3%

anaemia
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
0 | 170
1 | 129

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
0 | 170 | 56.9%
1 | 129 | 43.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts listed under Common Values]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

creatinine_phosphokinase
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 208
Distinct (%): 69.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 581.8394649
Minimum: 23
Maximum: 7861
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 23
5th percentile: 59
Q1: 116.5
Median: 250
Q3: 582
95th percentile: 2263
Maximum: 7861
Range: 7838
Interquartile range (IQR): 465.5

Descriptive statistics

Standard deviation: 970.2878807
Coefficient of variation (CV): 1.667621293
Kurtosis: 25.1490462
Mean: 581.8394649
Median Absolute Deviation (MAD): 182
Skewness: 4.463110085
Sum: 173970
Variance: 941458.5715
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
582 | 47 | 15.7%
66 | 4 | 1.3%
129 | 4 | 1.3%
231 | 3 | 1.0%
69 | 3 | 1.0%
68 | 3 | 1.0%
84 | 3 | 1.0%
115 | 3 | 1.0%
59 | 3 | 1.0%
60 | 3 | 1.0%
Other values (198) | 223 | 74.6%

Minimum 10 values

Value | Count | Frequency (%)
23 | 1 | 0.3%
30 | 1 | 0.3%
47 | 3 | 1.0%
52 | 1 | 0.3%
53 | 1 | 0.3%
54 | 1 | 0.3%
55 | 1 | 0.3%
56 | 2 | 0.7%
57 | 1 | 0.3%
58 | 1 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
7861 | 1 | 0.3%
7702 | 1 | 0.3%
5882 | 1 | 0.3%
5209 | 1 | 0.3%
4540 | 1 | 0.3%
3966 | 1 | 0.3%
3964 | 1 | 0.3%
2794 | 1 | 0.3%
2695 | 1 | 0.3%
2656 | 1 | 0.3%

diabetes
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
0 | 174
1 | 125

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value | Count | Frequency (%)
0 | 174 | 58.2%
1 | 125 | 41.8%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts listed under Common Values]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ejection_fraction
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 17
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.08361204
Minimum: 14
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 14
5th percentile: 20
Q1: 30
Median: 38
Q3: 45
95th percentile: 60
Maximum: 80
Range: 66
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.83484074
Coefficient of variation (CV): 0.3107594082
Kurtosis: 0.04140935982
Mean: 38.08361204
Median Absolute Deviation (MAD): 8
Skewness: 0.5553827517
Sum: 11387
Variance: 140.0634554
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=17)]

Common values

Value | Count | Frequency (%)
35 | 49 | 16.4%
38 | 40 | 13.4%
40 | 37 | 12.4%
25 | 36 | 12.0%
30 | 34 | 11.4%
60 | 31 | 10.4%
50 | 21 | 7.0%
45 | 20 | 6.7%
20 | 18 | 6.0%
55 | 3 | 1.0%
Other values (7) | 10 | 3.3%

Minimum 10 values

Value | Count | Frequency (%)
14 | 1 | 0.3%
15 | 2 | 0.7%
17 | 2 | 0.7%
20 | 18 | 6.0%
25 | 36 | 12.0%
30 | 34 | 11.4%
35 | 49 | 16.4%
38 | 40 | 13.4%
40 | 37 | 12.4%
45 | 20 | 6.7%

Maximum 10 values

Value | Count | Frequency (%)
80 | 1 | 0.3%
70 | 1 | 0.3%
65 | 1 | 0.3%
62 | 2 | 0.7%
60 | 31 | 10.4%
55 | 3 | 1.0%
50 | 21 | 7.0%
45 | 20 | 6.7%
40 | 37 | 12.4%
38 | 40 | 13.4%

high_blood_pressure
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
0 | 194
1 | 105

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 194 | 64.9%
1 | 105 | 35.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts listed under Common Values]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

platelets
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 176
Distinct (%): 58.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 263358.0293
Minimum: 25100
Maximum: 850000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 25100
5th percentile: 131800
Q1: 212500
Median: 262000
Q3: 303500
95th percentile: 422500
Maximum: 850000
Range: 824900
Interquartile range (IQR): 91000

Descriptive statistics

Standard deviation: 97804.23687
Coefficient of variation (CV): 0.3713736663
Kurtosis: 6.209254515
Mean: 263358.0293
Median Absolute Deviation (MAD): 44000
Skewness: 1.462320838
Sum: 78744050.75
Variance: 9565668749
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
263358.03 | 25 | 8.4%
221000 | 4 | 1.3%
279000 | 4 | 1.3%
271000 | 4 | 1.3%
305000 | 4 | 1.3%
226000 | 4 | 1.3%
228000 | 4 | 1.3%
235000 | 4 | 1.3%
237000 | 4 | 1.3%
255000 | 4 | 1.3%
Other values (166) | 238 | 79.6%

Minimum 10 values

Value | Count | Frequency (%)
25100 | 1 | 0.3%
47000 | 1 | 0.3%
51000 | 1 | 0.3%
62000 | 1 | 0.3%
70000 | 1 | 0.3%
73000 | 1 | 0.3%
75000 | 1 | 0.3%
87000 | 1 | 0.3%
105000 | 1 | 0.3%
119000 | 1 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
850000 | 1 | 0.3%
742000 | 1 | 0.3%
621000 | 1 | 0.3%
543000 | 1 | 0.3%
533000 | 1 | 0.3%
507000 | 1 | 0.3%
504000 | 1 | 0.3%
497000 | 1 | 0.3%
481000 | 1 | 0.3%
461000 | 1 | 0.3%

serum_creatinine
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 40
Distinct (%): 13.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.393879599
Minimum: 0.5
Maximum: 9.4
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 0.5
5th percentile: 0.7
Q1: 0.9
Median: 1.1
Q3: 1.4
95th percentile: 3
Maximum: 9.4
Range: 8.9
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 1.034510064
Coefficient of variation (CV): 0.7421803613
Kurtosis: 25.82823866
Mean: 1.393879599
Median Absolute Deviation (MAD): 0.2
Skewness: 4.455995882
Sum: 416.77
Variance: 1.070211073
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=40)]

Common values

Value | Count | Frequency (%)
1 | 50 | 16.7%
1.1 | 32 | 10.7%
0.9 | 32 | 10.7%
1.2 | 24 | 8.0%
0.8 | 24 | 8.0%
1.3 | 20 | 6.7%
0.7 | 19 | 6.4%
1.18 | 11 | 3.7%
1.4 | 9 | 3.0%
1.7 | 9 | 3.0%
Other values (30) | 69 | 23.1%

Minimum 10 values

Value | Count | Frequency (%)
0.5 | 1 | 0.3%
0.6 | 4 | 1.3%
0.7 | 19 | 6.4%
0.75 | 1 | 0.3%
0.8 | 24 | 8.0%
0.9 | 32 | 10.7%
1 | 50 | 16.7%
1.1 | 32 | 10.7%
1.18 | 11 | 3.7%
1.2 | 24 | 8.0%

Maximum 10 values

Value | Count | Frequency (%)
9.4 | 1 | 0.3%
9 | 1 | 0.3%
6.8 | 1 | 0.3%
6.1 | 1 | 0.3%
5.8 | 1 | 0.3%
5 | 1 | 0.3%
4.4 | 1 | 0.3%
4 | 1 | 0.3%
3.8 | 1 | 0.3%
3.7 | 1 | 0.3%

serum_sodium
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 27
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 136.6254181
Minimum: 113
Maximum: 148
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 113
5th percentile: 130
Q1: 134
Median: 137
Q3: 140
95th percentile: 144
Maximum: 148
Range: 35
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.412477284
Coefficient of variation (CV): 0.03229616675
Kurtosis: 4.119712008
Mean: 136.6254181
Median Absolute Deviation (MAD): 3
Skewness: -1.048136016
Sum: 40851
Variance: 19.46995578
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=27)]

Common values

Value | Count | Frequency (%)
136 | 40 | 13.4%
137 | 38 | 12.7%
140 | 35 | 11.7%
134 | 32 | 10.7%
138 | 23 | 7.7%
139 | 22 | 7.4%
135 | 16 | 5.4%
132 | 14 | 4.7%
141 | 12 | 4.0%
142 | 11 | 3.7%
Other values (17) | 56 | 18.7%

Minimum 10 values

Value | Count | Frequency (%)
113 | 1 | 0.3%
116 | 1 | 0.3%
121 | 1 | 0.3%
124 | 1 | 0.3%
125 | 1 | 0.3%
126 | 1 | 0.3%
127 | 3 | 1.0%
128 | 2 | 0.7%
129 | 2 | 0.7%
130 | 9 | 3.0%

Maximum 10 values

Value | Count | Frequency (%)
148 | 1 | 0.3%
146 | 1 | 0.3%
145 | 9 | 3.0%
144 | 5 | 1.7%
143 | 3 | 1.0%
142 | 11 | 3.7%
141 | 12 | 4.0%
140 | 35 | 11.7%
139 | 22 | 7.4%
138 | 23 | 7.7%

sex
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
1 | 194
0 | 105

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
1 | 194 | 64.9%
0 | 105 | 35.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts listed under Common Values]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

smoking
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
0 | 203
1 | 96

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 203 | 67.9%
1 | 96 | 32.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts listed under Common Values]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

time
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 148
Distinct (%): 49.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 130.2608696
Minimum: 4
Maximum: 285
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 4
5th percentile: 12.9
Q1: 73
Median: 115
Q3: 203
95th percentile: 250
Maximum: 285
Range: 281
Interquartile range (IQR): 130

Descriptive statistics

Standard deviation: 77.61420795
Coefficient of variation (CV): 0.5958367099
Kurtosis: -1.212047967
Mean: 130.2608696
Median Absolute Deviation (MAD): 71
Skewness: 0.1278026456
Sum: 38948
Variance: 6023.965276
Monotonicity: Increasing

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
250 | 7 | 2.3%
187 | 7 | 2.3%
10 | 6 | 2.0%
186 | 6 | 2.0%
107 | 6 | 2.0%
30 | 5 | 1.7%
209 | 5 | 1.7%
244 | 5 | 1.7%
95 | 5 | 1.7%
214 | 5 | 1.7%
Other values (138) | 242 | 80.9%

Minimum 10 values

Value | Count | Frequency (%)
4 | 1 | 0.3%
6 | 1 | 0.3%
7 | 2 | 0.7%
8 | 2 | 0.7%
10 | 6 | 2.0%
11 | 2 | 0.7%
12 | 1 | 0.3%
13 | 1 | 0.3%
14 | 2 | 0.7%
15 | 2 | 0.7%

Maximum 10 values

Value | Count | Frequency (%)
285 | 1 | 0.3%
280 | 1 | 0.3%
278 | 1 | 0.3%
271 | 1 | 0.3%
270 | 2 | 0.7%
258 | 2 | 0.7%
257 | 1 | 0.3%
256 | 2 | 0.7%
250 | 7 | 2.3%
247 | 1 | 0.3%

DEATH_EVENT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
0 | 203
1 | 96

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
0 | 203 | 67.9%
1 | 96 | 32.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts listed under Common Values]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Interactions

[Figures: 49 interaction plots; images not reproduced in this text version]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
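
The rank-then-correlate recipe described here can be sketched in plain Python. This is a minimal illustration, not the code that produced the report; the helper names `average_ranks` and `spearman_rho` are made up for the example, and giving tied values the mean of their positions is an assumption about tie handling.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# A nonlinear but strictly increasing relationship still gives rho close to 1.
rho = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
print(rho)
```

The function divides by zero if either input is constant, since a constant variable has no rank variance.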

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
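
As a rough sketch of that calculation, the snippet below computes r in plain Python and checks the location-and-scale invariance mentioned above. The function name `pearson_r` is illustrative, and the two short lists are the age and serum_creatinine values from the first five sample rows of this report, used only as toy input; the result is not a statistic of the full dataset.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so plain sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

age = [75.0, 55.0, 65.0, 50.0, 65.0]
serum_creatinine = [1.9, 1.1, 1.3, 1.9, 2.7]

r = pearson_r(age, serum_creatinine)
# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r(age, [3 * y + 7 for y in serum_creatinine])
print(r, r_rescaled)
```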

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
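
Taken literally, this pair-counting definition can be sketched as follows. The function name `kendall_tau_a` is an assumption for the example, and tied pairs are counted as neither concordant nor discordant. Many statistical libraries report a tie-adjusted variant instead, so their values can differ from this sketch when ties are present.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # The product is positive when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in xs or ys; it counts toward neither total.
    return (concordant - discordant) / len(pairs)

# Two concordant pairs and one discordant pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```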

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
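
For illustration, the bias correction is commonly written as φ²corr = max(0, χ²/n - (k-1)(r-1)/(n-1)), with corrected dimensions k_corr = k - (k-1)²/(n-1) and r_corr = r - (r-1)²/(n-1), and V = sqrt(φ²corr / min(k_corr - 1, r_corr - 1)). The sketch below follows that form in plain Python. It is not necessarily identical to the report's internal implementation, the function name `cramers_v_corrected` is made up, and the two contingency tables are hypothetical counts, not taken from this dataset.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts.

    Assumes every row and column total is positive and that the table
    has at least two rows and two columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic from observed and expected counts.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Hypothetical tables: one with independent margins, one perfectly associated.
v_independent = cramers_v_corrected([[20, 30], [40, 60]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(v_independent, v_perfect)
```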

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure: bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

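index | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT
0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.00 | 1.9 | 130 | 1 | 0 | 4 | 1
1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1
2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.00 | 1.3 | 129 | 1 | 1 | 7 | 1
3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.00 | 1.9 | 137 | 1 | 0 | 7 | 1
4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.00 | 2.7 | 116 | 0 | 0 | 8 | 1
5 | 90.0 | 1 | 47 | 0 | 40 | 1 | 204000.00 | 2.1 | 132 | 1 | 1 | 8 | 1
6 | 75.0 | 1 | 246 | 0 | 15 | 0 | 127000.00 | 1.2 | 137 | 1 | 0 | 10 | 1
7 | 60.0 | 1 | 315 | 1 | 60 | 0 | 454000.00 | 1.1 | 131 | 1 | 1 | 10 | 1
8 | 65.0 | 0 | 157 | 0 | 65 | 0 | 263358.03 | 1.5 | 138 | 0 | 0 | 10 | 1
9 | 80.0 | 1 | 123 | 0 | 35 | 1 | 388000.00 | 9.4 | 133 | 1 | 1 | 10 | 1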

Last rows

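index | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT
289 | 90.0 | 1 | 337 | 0 | 38 | 0 | 390000.0 | 0.9 | 144 | 0 | 0 | 256 | 0
290 | 45.0 | 0 | 615 | 1 | 55 | 0 | 222000.0 | 0.8 | 141 | 0 | 0 | 257 | 0
291 | 60.0 | 0 | 320 | 0 | 35 | 0 | 133000.0 | 1.4 | 139 | 1 | 0 | 258 | 0
292 | 52.0 | 0 | 190 | 1 | 38 | 0 | 382000.0 | 1.0 | 140 | 1 | 1 | 258 | 0
293 | 63.0 | 1 | 103 | 1 | 35 | 0 | 179000.0 | 0.9 | 136 | 1 | 1 | 270 | 0
294 | 62.0 | 0 | 61 | 1 | 38 | 1 | 155000.0 | 1.1 | 143 | 1 | 1 | 270 | 0
295 | 55.0 | 0 | 1820 | 0 | 38 | 0 | 270000.0 | 1.2 | 139 | 0 | 0 | 271 | 0
296 | 45.0 | 0 | 2060 | 1 | 60 | 0 | 742000.0 | 0.8 | 138 | 0 | 0 | 278 | 0
297 | 45.0 | 0 | 2413 | 0 | 38 | 0 | 140000.0 | 1.4 | 140 | 1 | 1 | 280 | 0
298 | 50.0 | 0 | 196 | 0 | 45 | 0 | 395000.0 | 1.6 | 136 | 1 | 1 | 285 | 0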